Avoid index clone in next #1584

Open

scho-furiosa wants to merge 3 commits into rust-ndarray:master from scho-furiosa:no-clone

Conversation


@scho-furiosa scho-furiosa commented Feb 10, 2026

Hello, everyone.

My colleague @chaehhyun found some performance regressions during profiling that we traced to a clone in next. We suspect ix.clone() inside Baseiter::next is the culprit, but we are not 100% sure yet; we are still experimenting with its impact internally.

I have a couple of questions:

  • I am curious if the cloning of index in next is intentional.
  • If not, is there any pre-existing perf test in CI?

Thank you.

There is an avoidable `clone` in `next`.

This commit introduces `next_for_mut`, which is similar to `next_for`,
but updates the key in-place, and uses it in `next` instead.
Collaborator

akern40 commented Feb 11, 2026

Hello! Sorry to hear about a regression. Can you provide a minimal working example of the regression? Or some additional debug information?

Collaborator

akern40 commented Feb 11, 2026

Also, what version caused the regression?

Author

Also, what version caused the regression?

We are using version 0.15.6. That said, I do not think this issue is related to the version, since the ix.clone() line has been there since the initial commit in 2014.

Can you provide a minimal working example of the regression? Or some additional debug information?

AFAIU, this can be problematic when there are many dimensions (probably more than 6, i.e. IxDyn), because cloning the key then becomes expensive. Basically, I just think cloning the key is unnecessary for normal iteration.

Re the minimal example, I am still curious whether there are perf tests in the repo, or whether you would like to have one. This "cloning more or less" is not about functional correctness, so it is unclear which type of minimal example you have in mind.

Collaborator

akern40 commented Feb 11, 2026

Ah thank you for the clarification. A "regression" is usually meant as a drop in performance or correctness when compared to a previous version, so I thought this was behavior you were seeing after an upgrade.

Unfortunately, performance is a known issue when using dynamic-dimensioned arrays. I haven't had a chance yet to look into the cause, but it's possible that this fix would help. I'd be happy to follow up and test to see if it improves speed. This is where a minimal example would help: the smallest possible code that exhibits the slow behavior that you're seeing.

There are some limited benchmarks in the library, but nothing comprehensive.

Collaborator

akern40 commented Feb 11, 2026

These changes look good, I appreciate the contribution! I'll put together a small benchmark to ensure that it has a positive impact on performance (I'm 99% sure that it will) and then I will merge the changes.

Comment on lines 199 to 201
Collaborator

Why can't this function call next_for_mut now?

let mut index = index;
self.next_for_mut(&mut index)

Author

Here is a small example. The perf change does not seem to be dramatic though.

#[test]
fn test_iter_perf()
{
    let a = ArrayD::from_shape_vec(
        vec![
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2,
        ],
        vec![3.0; 1024 * 1024 * 16],
    )
    .unwrap();
    for _x in a.reversed_axes().iter() {}
}

before:

~/ndarray$ cargo test -- --nocapture test_iter_perf 2> /dev/null | grep iter_perf -A 3
test test_iter_perf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 7.02s

~/ndarray$ cargo test -- --nocapture test_iter_perf 2> /dev/null | grep iter_perf -A 3
test test_iter_perf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 6.90s

~/ndarray$ cargo test -- --nocapture test_iter_perf 2> /dev/null | grep iter_perf -A 3
test test_iter_perf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 6.89s

after:

~/ndarray$ cargo test -- --nocapture test_iter_perf 2> /dev/null | grep iter_perf -A 3
test test_iter_perf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 5.60s

~/ndarray$ cargo test -- --nocapture test_iter_perf 2> /dev/null | grep iter_perf -A 3
test test_iter_perf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 5.40s

~/ndarray$ cargo test -- --nocapture test_iter_perf 2> /dev/null | grep iter_perf -A 3
test test_iter_perf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 5.39s

A "regression" is usually meant as a drop in performance or correctness when compared to a previous version

Sorry for the confusion. My bad. /o\

Collaborator

@scho-furiosa Can you please fix the errors?

IIUC, this PR can only make ArrayD faster; it can't have a significant impact on other arrays? Because we would be copying N usizes instead of allocating and copying a small vector?

Author

scho-furiosa commented Feb 12, 2026

Can you please fix the errors?

Thank you. I fixed the clippy errors, but I am not sure how to handle the miri-related one.

IIUC, this PR can only make ArrayD faster, it can't have a significant impact on other arrays? Because we copying N usize instead of allocating and copying a small vector?

Exactly, that is my understanding. The index copies in Array1-6 should have been cheap regardless of this PR, since those index types are Copy in Rust.

Collaborator

LGTM, but I'll let @akern40 run their small benchmark.

Collaborator

akern40 commented Feb 13, 2026

The last thing I want to see is the above ArrayD benchmark replicated for a fixed-rank array, say Array6. I see that there is a significant speedup from the change for ArrayD, but I want to understand what the "gap" was between ArrayD and Array6 and how that change has affected the gap.

Author

scho-furiosa commented Feb 13, 2026

but I want to understand what the "gap" was between ArrayD and Array6 and how that change has affected the gap.

Interesting point!

  • To precisely compare the iteration times only, I added a timer.
  • For a fair comparison of ArrayD and Array6, I revised the shape.
use std::time::Instant;

#[test]
fn test_iter_perfd()
{
    let a = ArrayD::from_shape_vec(vec![1024, 1024, 2, 2, 2, 2], vec![3.0; 1024 * 1024 * 16])
        .unwrap()
        .reversed_axes();

    let start = Instant::now();
    for _x in a.iter() {}
    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
}

#[test]
fn test_iter_perf6()
{
    let a = Array6::from_shape_vec((1024, 1024, 2, 2, 2, 2), vec![3.0; 1024 * 1024 * 16])
        .unwrap()
        .reversed_axes();

    let start = Instant::now();
    for _x in a.iter() {}
    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
}

Here is the result.

ArrayD before

Time elapsed is: 4.147015005s
Time elapsed is: 4.185004973s
Time elapsed is: 4.152269005s

Array6 before

Time elapsed is: 2.877281448s
Time elapsed is: 2.874772023s
Time elapsed is: 2.847792364s

======

ArrayD after

Time elapsed is: 2.870766284s
Time elapsed is: 2.859789082s
Time elapsed is: 2.862473005s


Array6 after

Time elapsed is: 2.761921391s
Time elapsed is: 2.771982213s
Time elapsed is: 2.749912549s
  • A significant speed up in ArrayD, about -31%.
  • A minor speed up in Array6, about -3%.
  • The gap between ArrayD and Array6 was x1.45 and now is x1.04 with this PR.
